Abstract

We show that, for any $c > 0$, the (1+1) evolutionary algorithm using an arbitrary
mutation rate $p_n = c/n$ finds the optimum of a linear objective function over
bit strings of length $n$ in expected time $\Theta(n \log n)$. Previously, this was only
known for $c \le 1$. Since previous work also shows that universal drift functions cannot
exist for $c$ larger than a certain constant, we instead define drift functions which
depend crucially on the relevant objective functions (and also on $c$ itself). Using
these carefully constructed drift functions, we prove that the expected optimisation
time is $\Theta(n \log n)$. By giving an alternative proof of the multiplicative drift theorem,
we also show that our optimisation-time bound holds with high probability.

1 Introduction 

Drift analysis is central to the field of evolutionary algorithms. This type of analysis was
implicit in the work of Droste, Jansen and Wegener [9], who analysed the optimisation of
linear functions over bit strings by the classical (1+1) evolutionary algorithm ((1+1) EA)
with mutation rate $p_n = 1/n$. The method was made explicit in the work of He and Yao,
who gave a simple, clean analysis. Later fundamental applications of drift analysis in
the theory of evolutionary computation include [11], [12], [15], [20], [22].

Recent work by Johannsen, Winzen and the first author [6, 7] shows that drift analysis,
as it is currently used, relies strongly on the fact that the mutation probabilities $p_n$
are relatively small. As He and Yao observed [17], the analysis in [16] only applies if
the mutation probability $p_n$ is strictly smaller than $1/n$, where $n$ is the length of the
bit strings of the search space.

This restriction was improved in [18], where a family of drift functions was presented
that works for the most common mutation probability $p_n = 1/n$.^1 However, as Doerr
et al. have observed [7], this family of drift functions still ceases to work for $p_n > 4/n$.
Furthermore, if $p_n > 4/n$, then for any universal family of drift functions (from
the class of log-of-linear functions) there is a linear objective function $f$ and a search
space element $x$ such that the drift from $x$ is negative (so the proof that the (1+1)
EA converges quickly does not go through). Doerr et al. have also shown that this
problem cannot be fixed by applying the averaging approach of Jägersküpper [19]:
that approach fails for $p_n > 7/n$. Thus, prior to the work presented here, it was an open
problem whether the (1+1) EA minimises linear objective functions over bit strings in
$O(n \log n)$ time when the mutation probability is $p_n = c/n$ for $c > 7$.

Our main result shows that this is the case. Since it is known that no universal family
of drift functions exists, we instead define a feasible family of drift functions
in such a way that the drift function $\Phi_f$ depends crucially on the objective function $f$.
Using this idea, we show (see Theorem 7) that, for any constant $c$, the (1+1) EA with
mutation probability $p_n = c/n$ optimises any family of linear objective functions over
bit strings in expected time $O(n \log n)$. A corresponding lower bound follows easily from
standard arguments, see Theorem [TUl. Thus, our result is as good as possible (up to a
constant factor).

By reproving a multiplicative drift theorem (which was first used to analyse evolu- 
tionary algorithms in [7]), we also show that our bound on the optimisation time holds 
with high probability. The tail bounds in our drift theorem can also be used to show that 
many other known bounds on optimisation times also hold with high probability. This 
has been done for the (1+1) EA finding minimum spanning trees, computing shortest 
paths or Eulerian cycles in [3]. 

2 Drift Analysis 

In this section, we give a brief description of drift analysis, which is sufficient for our 
purposes. For a more general background to drift analysis, we refer to the papers cited 
above. 

2.1 The (1+1) evolutionary algorithm 

Let $F$ be a set of objective functions. Each $f \in F$ is associated with a problem size
$n(f) \in \mathbb{N}$ and is a function from the search space $\Omega_f$ to $\mathbb{R}_{\ge 0}$. Given $f$, the goal
is to find an element $x \in \Omega_f$ such that $f(x)$ is minimised. Our assumption that the
optimisation problem is minimisation (as opposed to maximisation) is without loss of
generality, as is our assumption that the range of each objective function contains only
non-negative numbers. For each objective function $f$, let $\Omega_{\mathrm{opt},f}$ denote the set of
optimal search points, that is, those that minimise the value of $f$.

^1 Note, though, that in that paper an EA only accepting strict improvements was analysed; this fact
was exploited in the proof. We have little doubt, though, that their proof can be adapted to work also
for the more common setting in which an offspring with equal fitness is also accepted.

Definition 1. We say that $F$ is a family of objective functions over bit strings if, for
every $f \in F$, $\Omega_f = \{0,1\}^{n(f)}$. In this case, an element $x \in \Omega_f$ is a string of $n(f)$ bits,
$x = x_{n(f)} \ldots x_1$.

Definition 2. Suppose that $F$ is a family of objective functions over bit strings. We say
that $F$ is linear if each $f \in F$ is of the form $f(x) = \sum_{i=1}^{n(f)} a_i x_i$, where the coefficients
$a_i$ are real numbers. Without loss of generality, we assume that $a_{i+1} \ge a_i > 0$ for all
$i \in \{1, \ldots, n(f)-1\}$.

Example 1. Suppose, for $n \in \mathbb{N}$, that $f_n : \{0,1\}^n \to \mathbb{R}_{\ge 0}$ is defined by
$f_n(x_n \ldots x_1) = \sum_{i=1}^{n} 2^{i-1} x_i$. Then $F = \{f_n\}$ is a linear family of objective functions
over bit strings. The value of $f_n(x)$ is the binary value of the bit string $x = x_n \ldots x_1$.

Example 2. Suppose, for $n \in \mathbb{N}$, that $f_n : \{0,1\}^n \to \mathbb{R}_{\ge 0}$ is defined by
$f_n(x_n \ldots x_1) = \sum_{i=1}^{n} x_i$. Then $F = \{f_n\}$ is a linear family of objective functions
over bit strings. The value of $f_n(x)$ is the number of ones in the bit string $x$.
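
The two examples can be written down directly. The following is a minimal sketch, not part of the original development; the function names and the convention of storing bit $x_i$ at list index $i-1$ are assumptions made here for illustration.

```python
from typing import List, Sequence


def linear_objective(coeffs: Sequence[float], x: Sequence[int]) -> float:
    """Return f(x) = sum_i a_i * x_i.

    coeffs[i - 1] holds a_i and x[i - 1] holds the bit x_i, so index 0 is the
    rightmost (least-significant) position of the string x_n ... x_1.
    """
    if len(coeffs) != len(x):
        raise ValueError("coeffs and x must have the same length")
    return sum(a * b for a, b in zip(coeffs, x))


def binval_coeffs(n: int) -> List[int]:
    """Coefficients a_i = 2^(i-1) of Example 1 (binary value)."""
    return [2 ** i for i in range(n)]


def onemax_coeffs(n: int) -> List[int]:
    """Coefficients a_i = 1 of Example 2 (number of ones)."""
    return [1] * n


# The string x_4 x_3 x_2 x_1 = 1101, stored right to left.
x = [1, 0, 1, 1]
print(linear_objective(binval_coeffs(4), x))  # binary value of 1101
print(linear_objective(onemax_coeffs(4), x))  # number of ones
```

Both coefficient lists are non-decreasing and positive, matching the normalisation in Definition 2.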

The randomised search heuristic that we study is the well-known (1+1) EA. To
emphasize the role of the parameters, we refer to this algorithm as the (1+1) EA for
minimising $F$. Given an objective function $f \in F$, this algorithm starts with an initial
solution $x$, chosen uniformly at random from the search space $\Omega_f$. In each iteration,
from its existing solution $x$, it generates a new solution $x'$ by mutation.

Definition 3. Suppose $F$ is a family of objective functions over bit strings and that
$p_n \in [0,1]$ for $n \in \mathbb{N}$. In independent bit mutation, each bit of $x$ is flipped independently
with probability $p_n$. In other words, for each $i \in \{1, \ldots, n\}$ independently, we have
$\Pr(x'_i = 1 - x_i) = p_n$ and $\Pr(x'_i = x_i) = 1 - p_n$. Often, $p_n = 1/n$, but we do not make
this assumption.

In the subsequent selection step, if $f(x') \le f(x)$, the EA accepts the solution $x'$,
meaning that the next iteration starts with $x_{\mathrm{new}} := x'$. Otherwise, the next iteration
starts with $x_{\mathrm{new}} := x$. Since we are interested in determining the number of iterations
that are necessary to find an optimal solution, we do not specify a termination criterion
here. A pseudo-code description of the (1+1) EA is given in Algorithm 1.

Note that the (1+1) EA is not typically used to solve difficult optimisation problems 
in practice. There are other, more complex, search heuristics which are better for such 
problems in practice. However, understanding the optimisation behaviour of the (1+1) 
EA often helps us to predict the optimisation behaviour of more complicated EAs (which 
are mostly too complex to allow rigorous theoretical analysis). As such, the (1+1) EA 
proved to be an important tool that attracted significant research efforts (see, e.g., [UlHllQ] 
for some early works). 
Algorithm 1 The (1+1) EA for minimising F over bit strings with independent bit mutation

Input: an objective function $f \in F$.
Initialization: choose $x \in \{0,1\}^{n(f)}$ uniformly at random.
repeat forever
    Create $x' \in \{0,1\}^{n(f)}$ by copying $x$.
    Mutation: flip each bit in $x'$ independently with probability $p_{n(f)}$.
    Selection: if $f(x') \le f(x)$ then $x := x'$.


2.2 A simple drift theorem with tail bounds 

The optimisation time of the (1+1) EA for minimising $F$ is defined to be the number
of times that the objective function is evaluated before the optimum is found. This
is (apart from an additive deviation of one) equal to the number of mutation-selection
iterations. Suppose that $c$ is a positive constant and that $F$ is a family of linear objective
functions over bit strings. Our main result (Theorem 7) shows that the (1+1) EA for
minimising $F$ with independent bit-mutation rate $p_n = c/n$ has expected optimisation
time $O(n(f) \log n(f))$. It also shows that, with high probability, the optimisation time
is of this order of magnitude.

In order to prove the main result, we introduce the notion of piece-wise polynomial 
drift. This will be explained in Section 2.5. In this section, we prepare the groundwork,
by introducing the basic drift theorems that we will need. We start by defining the 
notion of a feasible family of drift functions. When feasible families of drift functions 
exist, they allow an elegant analysis yielding upper bounds for the optimisation time of 
EAs. 

Definition 4. Let $\nu : \mathbb{N} \to \mathbb{R}_{\ge 0}$ be monotonically increasing and consider a family $F$ of
objective functions. For each $f \in F$, let $\Phi_f$ be a function from $\Omega_f$ to $\mathbb{R}_{\ge 0}$. We say that
$\Phi = \{\Phi_f\}$ is a $\nu$-feasible family of drift functions for a (1+1) EA for minimising $F$, if
there is an $n_0 \in \mathbb{N}$ such that, for every $f \in F$ with $n(f) \ge n_0$, the following conditions
are satisfied.

1. $\Phi_f(x) = 0$ for all $x \in \Omega_{\mathrm{opt},f}$;

2. $\Phi_f(x) \ge 1$ for all $x \in \Omega_f \setminus \Omega_{\mathrm{opt},f}$;

3. for all $x \in \Omega_f \setminus \Omega_{\mathrm{opt},f}$,

$$E[\Phi_f(x_{\mathrm{new}})] \le \left(1 - \frac{1}{\nu(n(f))}\right) \Phi_f(x),$$

where, as above, we denote by $x_{\mathrm{new}}$ the solution resulting from executing a single
iteration (consisting of mutation and selection) with initial solution $x$.

Here is a simple example. 
Example 3. Fix a positive constant $c$. Let $F$ be a linear family of objective functions
over bit strings and consider the (1+1) EA for minimising $F$ which uses independent bit
mutation with $p_n = c/n$. Suppose that, for each $f \in F$, the coefficient $a_1$ is at least 1.
Then the trivial family $\Phi$ with $\Phi_f = f$ is an $(n/c')$-feasible family of drift functions for
this EA, where $c' := c(1 - c/n)^{n-1} \approx c e^{-c}$. However, as we shall see, this is not often a
very useful family of drift functions.
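
One way to see where $c'$ comes from: the event that a fixed bit flips while the other $n-1$ bits stay unchanged has probability $(c/n)(1-c/n)^{n-1} = c'/n$, and such a flip of a one-bit is always accepted and lowers $f$ by the corresponding coefficient. The short sketch below (the helper name `c_prime` is an assumption introduced here) compares $c'$ with its limit $c e^{-c}$ for a few values of $n$.

```python
import math


def c_prime(c: float, n: int) -> float:
    """c' = c * (1 - c/n)^(n-1): n times the probability that one fixed bit
    flips and the other n - 1 bits do not."""
    return c * (1.0 - c / n) ** (n - 1)


c = 2.0
for n in (10, 100, 10_000):
    print(n, c_prime(c, n), c * math.exp(-c))
```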

The following well-known theorem (Theorem 5 below) shows how the optimisation
time can be bounded using a drift function. Similar arguments appear in the context
of coupling proofs; see, for example, [10, Section 5]. Much more is known about drift
analysis; see, for example, [14]. Note that Theorem 5 gives a probability tail bound in
addition to an upper bound on the expected optimisation time. The tail bound is not
new, but it seems to be unknown in the evolutionary algorithms literature. It can be
applied to improve several previous results (see [4]).

Theorem 5. Consider a family $F$ of objective functions and a $\nu$-feasible family $\Phi$ of drift
functions for a (1+1) EA for minimising $F$. Let $\Phi_{\max,f}$ denote $\max\{\Phi_f(x) \mid x \in \Omega_f\}$.
Then there is an $n_1 \in \mathbb{N}$ such that, for every $f \in F$ with $n(f) \ge n_1$, the expected
optimisation time of the EA is at most

$$\nu(n(f)) \left(\ln \Phi_{\max,f} + 1\right).$$

Also, for any $\lambda > 0$, the probability that the optimisation time exceeds

$$\left\lceil \nu(n(f)) \left(\ln \Phi_{\max,f} + \lambda\right) \right\rceil$$

is at most $\exp(-\lambda)$.

Proof. Let $n_0$ be the value from Definition 4. Definition 4 rules out the possibility that
$\max\{\nu(n) \mid n \ge n_0\} < 1$. Also, if $\max\{\nu(n) \mid n \ge n_0\} = 1$ then, from part (3) of the
definition, $E[\Phi_f(x_{\mathrm{new}})] = 0$, so the optimisation time is 1. Suppose then, that there is an
$n \in \mathbb{N}$ such that $\nu(n) > 1$. Let $n'_0$ be $\min\{n \in \mathbb{N} \mid \nu(n) > 1\}$ (actually, it would suffice
to take $n'_0$ to be any member of this set, but, for concreteness, we take the minimum).
Let $n_1 = \max(n_0, n'_0)$. Now consider any $f \in F$ with $n(f) \ge n_1$ and note that the first
two conditions in Definition 4 are satisfied.

Let $n = n(f)$. Fix an arbitrary initial solution $x_0 \in \Omega_f$. Consider starting the EA
with this initial solution $x_0$ instead of choosing a random one. Denote by $\Phi_{[t]}$ the value of
$\Phi_f(x)$ after $t$ selection-mutation steps. Denote by $T_{\mathrm{opt},x_0}$ the first time when the current
solution $x$ is optimal. Thus, from Definition 4, $\Phi_{[T_{\mathrm{opt},x_0}]} = 0$. If $t < T_{\mathrm{opt},x_0}$, we have
$\Phi_{[t]} \ge 1$. From the third condition in Definition 4,

$$E[\Phi_{[t]}] \le (1 - 1/\nu(n))^t \, \Phi_{[0]} \le (1 - 1/\nu(n))^t \, \Phi_{\max,f} \le \exp(-t/\nu(n)) \, \Phi_{\max,f},$$

where, in the last estimate, we used the well-known inequality $1 + z \le e^z$, which is valid
for all $z \in \mathbb{R}$.
It is well known (see, for example, [13, Problem 13(a), Section 3.11]) that if $X$ is a
random variable taking values in the non-negative integers, then $E[X] = \sum_{i=1}^{\infty} \Pr(X \ge i)$.
Therefore, the expected optimisation time $E[T_{\mathrm{opt},x_0}]$ can be written as

$$E[T_{\mathrm{opt},x_0}] = \sum_{i \ge 1} \Pr(T_{\mathrm{opt},x_0} \ge i) = \sum_{t \ge 0} \Pr(\Phi_{[t]} > 0).$$

So, for any non-negative integer $T$, $E[T_{\mathrm{opt},x_0}] \le T + \sum_{t \ge T} \Pr(\Phi_{[t]} > 0)$. Since, by
Markov's inequality, $\Pr(\Phi_{[t]} > 0) = \Pr(\Phi_{[t]} \ge 1) \le E[\Phi_{[t]}]$,

$$E[T_{\mathrm{opt},x_0}] \le T + \sum_{t \ge T} E[\Phi_{[t]}].$$

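Now let $T = \lceil \ln(\Phi_{\max,f}) \, \nu(n) \rceil = \ln(\Phi_{\max,f}) \, \nu(n) + \varepsilon$ for some $0 \le \varepsilon < 1$. By our
upper bounds above, we obtain

$$E[T_{\mathrm{opt},x_0}] \le T + (1 - 1/\nu(n))^{T} \, \Phi_{\max,f} \sum_{i=0}^{\infty} (1 - 1/\nu(n))^{i}.$$

Since $\nu(n) > 1$, $\sum_{i=0}^{\infty} (1 - 1/\nu(n))^{i} = \nu(n)$. Plugging this in with the definition of $T$ and
using $(1 - 1/\nu(n))^{\ln(\Phi_{\max,f}) \nu(n)} \le \exp(-\ln \Phi_{\max,f}) = 1/\Phi_{\max,f}$,

$$E[T_{\mathrm{opt},x_0}] \le \ln(\Phi_{\max,f}) \, \nu(n) + \varepsilon + (1 - 1/\nu(n))^{\varepsilon} \, \nu(n)
= \nu(n) \left( \ln(\Phi_{\max,f}) + \varepsilon/\nu(n) + (1 - 1/\nu(n))^{\varepsilon} \right).$$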

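We can now check, for every $\varepsilon \in [0,1]$, that $\varepsilon/\nu(n) + (1 - 1/\nu(n))^{\varepsilon} \le 1$, as required. This
is most easily seen by checking it for $\varepsilon = 0$ and $\varepsilon = 1$ and noting that the term is convex
in $\varepsilon$. Finally, let $T' := \lceil \nu(n) (\ln(\Phi_{\max,f}) + \lambda) \rceil$ for $\lambda > 0$. We compute

$$\Pr(T_{\mathrm{opt},x_0} > T') = \Pr(\Phi_{[T']} > 0) \le E[\Phi_{[T']}] \le \exp(-T'/\nu(n)) \, \Phi_{\max,f} \le \exp(-\lambda).$$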

□ 
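
To get a feeling for the two bounds in Theorem 5, they can be evaluated numerically. The sketch below is illustrative only; the function names are assumptions introduced here, and the sample values $\nu = e \cdot n$ and $\Phi_{\max,f} = n$ are chosen to roughly match Example 2 with $\Phi_f = f$ and $c = 1$.

```python
import math


def expected_time_bound(nu: float, phi_max: float) -> float:
    """nu * (ln(phi_max) + 1), the bound on the expected optimisation time."""
    return nu * (math.log(phi_max) + 1.0)


def tail_threshold(nu: float, phi_max: float, lam: float) -> int:
    """ceil(nu * (ln(phi_max) + lam)); exceeded with probability <= exp(-lam)."""
    return math.ceil(nu * (math.log(phi_max) + lam))


# Illustrative values only: nu = e * n and phi_max = n for n = 1000.
n = 1000
nu = math.e * n
print(expected_time_bound(nu, n))
for lam in (1.0, 5.0, 10.0):
    # threshold on the optimisation time, and the bound on exceeding it
    print(lam, tail_threshold(nu, n, lam), math.exp(-lam))
```

Each additional unit of $\lambda$ adds about $\nu(n)$ iterations to the threshold while dividing the failure probability bound by $e$.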

The proof above uses the argument $E[\Phi_{[t]}] \le (1 - 1/\nu(n))^t \, \Phi_{\max,f}$. This had been
used previously in the so-called method of expected weight decrease [21]. There, however,
it was followed up with a simple Markov inequality argument that led to a bound on
the expected run-time that is weaker (by a constant factor) than what our drift theorem
yields. Hence the main difference between the two approaches is that ours gives a better
transformation of the drift of $\Phi$ into a bound on $E[\min\{t \mid \Phi_{[t]} < 1\}]$. Note, just
to avoid misunderstandings, that typically $E[\min\{t \mid \Phi_{[t]} < 1\}]$ and $\min\{t \mid E[\Phi_{[t]}] < 1\}$
are different quantities.

Theorem 5 indicates that a family of drift functions is better if the maximum values
$\Phi_{\max,f}$ are small. In Example 3, taking $\Phi_f = f$ only yields an upper bound
$O(n(f) \log f_{\max})$ for the expected optimisation time, where $f_{\max} = \max\{f(x) \mid x \in \Omega_f\}$.
This can be a weak bound. For example, applying it to the family $F$ from Example 1
yields a bound $O(n(f)^2)$ for the expected optimisation time (which, as we shall see, is a
weak bound).
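
The quadratic growth can be checked numerically. The following sketch (function name assumed here for illustration) evaluates the bound of Theorem 5 for the family of Example 1 with the trivial drift function, using $\nu(n) = n/c'$ from Example 3 and $f_{\max} = 2^n - 1$, and prints its ratio to $n^2$ and to $n \ln n$.

```python
import math


def trivial_drift_bound(n: int, c: float) -> float:
    """Bound nu * (ln(f_max) + 1) for Example 1 with Phi_f = f.

    Uses nu = n / c' with c' = c * (1 - c/n)^(n-1), and f_max = 2^n - 1.
    """
    c_prime = c * (1.0 - c / n) ** (n - 1)
    nu = n / c_prime
    f_max = 2 ** n - 1  # exact integer; math.log accepts large ints
    return nu * (math.log(f_max) + 1.0)


for n in (100, 1000, 10_000):
    bound = trivial_drift_bound(n, c=1.0)
    print(n, bound / n ** 2, bound / (n * math.log(n)))
```

The first ratio stays roughly constant while the second keeps growing, which is the gap that a better choice of drift function has to close.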
2.3 Drift analysis for linear objective functions over bit strings

The main goal of this paper is to analyse the optimisation time of the (1+1) EA for
minimising a linear family $F$ of objective functions over bit strings, assuming independent-bit
mutation with $p_n = c/n$ (for a fixed constant $c$). The reason for assuming $p_n = c/n$ is
that results of Droste, Jansen and Wegener (Theorems 13 and 14 in [9]) show that this
is the optimal order of magnitude. Since our objective is an $O(n(f) \log n(f))$ bound on
optimisation time, we ease the language with the following definition.

Definition 6. A feasible family of drift functions is a family of drift functions which is
$\nu$-feasible for a function $\nu(n) = O(n)$.

Finding feasible drift functions is typically quite tricky. Doerr, Johannsen and
Winzen built on earlier ideas of Droste, Jansen and Wegener [9] and He and Yao [18] in
order to show that, for any linear family $F$ of objective functions over bit strings, the
family $\Phi$ defined by

$$\Phi_f(x) = \sum_{i=1}^{\lfloor n(f)/2 \rfloor} x_i + \sum_{i=\lfloor n(f)/2 \rfloor + 1}^{n(f)} \frac{5}{4} x_i$$

is a feasible family of drift functions for the (1+1) EA for minimising $F$ which uses
independent bit mutation with $p_n = 1/n$. (Thus, this suffices for the case $c = 1$.)

This family $\Phi = \{\Phi_f\}$ is said to be a universal family of feasible drift functions
because $\Phi_f$ depends on $n(f)$, but not otherwise on $f$. Since $\Phi_{\max,f} = O(n(f))$, this gives
an expected optimisation time of $O(n(f) \log n(f))$, which is asymptotically optimal [9].
Proving that this $\Phi$ is a feasible family, while not trivial, is not overly complicated. This
discovery of a universal family of feasible drift functions gives an elegant analysis of the
EA.

Unfortunately, even if we allow $\Phi_{\max,f}$ to grow faster than $O(n(f))$, such universal
families of feasible drift functions only exist when $c$ is small (as noted in the introduction
to this paper). For larger values of $c$, the function $\Phi_f$ has to depend upon $f$. Prior to
this paper, no non-trivial drift functions of this form were known, so it was an open
problem whether the $O(n(f) \log n(f))$ time bound also applies for $c > 1$. We show that
this is the case.

2.4 Our result 

Our main theorem is as follows. 

Theorem 7. Let $c$ be a positive constant. Let $F$ be a family of linear objective functions
over bit strings. The (1+1) EA for minimising $F$ with independent bit-mutation rate
$p_n = c/n$ has expected optimisation time $O(n(f) \log n(f))$. There is a constant $k$ and
a function $\nu(n) = O(n)$ such that, for any $\lambda > 0$, the probability that the optimisation
time exceeds this bound by $k \nu(n) \lambda$ time steps is at most $k \exp(-\lambda)$.

We prove Theorem 7 by constructing a feasible family of drift functions for the EA
that is piece-wise polynomial (a notion that will be defined in Section 2.5). Lemma 9
extends Theorem 5 to piece-wise polynomial feasible families of drift functions, allowing
us to prove Theorem 7.

Theorem 7 is interesting for two reasons. On the methodological side, the proof of the
theorem greatly enlarges our understanding about how to choose good drift functions.
This might enable better solutions for some problems where drift analysis has not yet
been very successful. Examples are the minimum spanning tree problem [21] and the
single-criteria formulation of the single-source shortest path problem [2]. For both
problems, the known bounds on the expected optimisation time contain a $\log(f_{\max})$-factor,
stemming from the fact that, at least implicitly, drift analysis with the trivial family of
drift functions with $\Phi_f = f$ is conducted.

Of course, our result is also interesting because it shows for the first time that linear
functions are optimised by the (1+1) EA in time $O(n(f) \log n(f))$, regardless of what
mutation probability $p_n = c/n$ is used. Note that this is not obvious: it has been shown
that, already for monotone functions, a constant-factor change in the mutation
probability can change the optimisation time from polynomial to exponential.

2.5 Piece-wise polynomial drift 

Let $F$ be a family of linear objective functions over bit strings. Let $\Phi$ be a feasible family
of drift functions for a (1+1) EA for minimising $F$.

We start with an elementary observation about $\Phi$, which is that, in order to obtain
an $O(n(f) \log n(f))$ bound on the expected optimisation time, we do not really need
$\Phi_{\max,f}$ to be bounded from above by a polynomial in $n(f)$: we can afford to have a
constant number of "huge jumps". The following arguments can be seen as a variation
of the fitness level method [23].

Definition 8. Fix $k \in \mathbb{N}$. Suppose that, for every $f \in F$, $\mathcal{M}^f = M^f_0, \ldots, M^f_k$ is a
partition of $\Omega_f$. Let $\mathcal{M} = \{\mathcal{M}^f \mid f \in F\}$. We say $\mathcal{M}$ is a family of fitness-based
$k$-partitions for $F$ if for all $f \in F$,

1. $M^f_0 = \{0\}$,

2. for all $i < j$, $x \in M^f_i$ and $y \in M^f_j$, we have $f(x) < f(y)$.

We use the notation $\min \Phi_f(M^f_j)$ to denote $\min\{\Phi_f(x) \mid x \in M^f_j\}$ and the notation
$\max \Phi_f(M^f_j)$ to denote $\max\{\Phi_f(x) \mid x \in M^f_j\}$.

Lemma 9. Let $F$ be a family of linear objective functions over bit strings. Let $\Phi$ be a
$\nu$-feasible family of drift functions for a (1+1) EA for minimising $F$. Let $\mathcal{M}$ be a family
of fitness-based $k$-partitions for $F$. Then there is an $n_1 \in \mathbb{N}$ such that, for every $f \in F$
with $n(f) \ge n_1$, the expected optimisation time of the EA is at most

$$\nu(n(f)) \sum_{j=1}^{k} \left( 1 + \ln(\max \Phi_f(M^f_j)) - \ln(\min \Phi_f(M^f_j)) \right).$$

Also, for any $\lambda > 0$, the probability that the optimisation time exceeds

$$\sum_{j=1}^{k} \left\lceil \nu(n(f)) \left( \ln(\max \Phi_f(M^f_j)) - \ln(\min \Phi_f(M^f_j)) + \lambda \right) \right\rceil$$

is at most $k \exp(-\lambda)$.

Proof. Let $n_1$ be the quantity in Theorem 5 (which is at least as large as the quantity
$n_0$ in Definition 4). Let $f \in F$ with $n(f) \ge n_1$. For $0 \le j \le k$, let $\Omega_{f,j} = \bigcup_{i=0}^{j} M^f_i$ and
let $\mu_{f,j} = \min \Phi_f(M^f_j)$. For $1 \le j \le k$, define $\Psi_{f,j} : \Omega_{f,j} \to \mathbb{R}$ as follows. If $\Phi_f(x) \ge \mu_{f,j}$
then $\Psi_{f,j}(x) = \Phi_f(x)/\mu_{f,j}$. Otherwise, $\Psi_{f,j}(x) = 0$.

Now for $j \in \{1, \ldots, k\}$, consider restricting the search space to $\Omega_{f,j}$. Since the
partition $\mathcal{M}^f$ is fitness based, we conclude that, if the EA is started with input $f$, and
an initial solution in $\Omega_{f,j}$, all new solutions that are accepted by the EA are in $\Omega_{f,j}$.

Considering all solutions in $\Omega_{f,j-1}$ to be equivalent to the all-zero state 0, we note
that $\{\Psi_{f,j} \mid f \in F\}$ satisfies the first two conditions of being a $\nu$-feasible family of drift
functions for $F$ on $\{\Omega_{f,j}\}$. Also, if $\Phi_f(x) \ge \mu_{f,j}$ then $E[\Phi_f(x_{\mathrm{new}})] \le (1 - 1/\nu(n(f))) \Phi_f(x)$,
so

$$E[\Psi_{f,j}(x_{\mathrm{new}})] \le E[\Phi_f(x_{\mathrm{new}})/\mu_{f,j}] \le (1 - 1/\nu(n(f))) \, \Psi_{f,j}(x).$$

So, by Theorem 5, the expected time until a solution in $\Omega_{f,j-1}$ is reached is at most

$$\nu(n(f)) \left(1 + \ln \max\{\Psi_{f,j}(x) \mid x \in \Omega_{f,j}\}\right),$$

which is at most

$$\nu(n(f)) \left(1 + \ln \frac{\max \Phi_f(M^f_j)}{\min \Phi_f(M^f_j)}\right).$$

This gives the desired result, summing from $j = k$ down to $j = 1$.
For the high probability statement, again from Theorem 5, we conclude that with
probability at least $1 - \exp(-\lambda)$,

$$\left\lceil \nu(n(f)) \left( \ln \frac{\max \Phi_f(M^f_j)}{\min \Phi_f(M^f_j)} + \lambda \right) \right\rceil$$

iterations suffice to go from a solution in $\Omega_{f,j}$ to one in $\Omega_{f,j-1}$.

□ 
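
The effect of splitting at "huge jumps" can be illustrated numerically. In the sketch below, each stage contributes $\nu(1 + \ln(\max/\min))$, following the per-stage bound in the proof above; the function name, the value of `nu` and the two ranges are made-up assumptions chosen only to show the effect.

```python
import math
from typing import Sequence, Tuple


def piecewise_bound(nu: float, stages: Sequence[Tuple[float, float]]) -> float:
    """Sum over stages of nu * (1 + ln(max_phi / min_phi)).

    Each stage is a pair (min_phi, max_phi) for one non-empty part M_j^f.
    """
    return sum(nu * (1.0 + math.log(mx / mn)) for mn, mx in stages)


n = 50
nu = 3.0 * n
# Two parts separated by one huge jump: Phi ranges over [1, n] on the first
# part and over [2^n, n * 2^n] on the second.
stages = [(1, n), (2 ** n, n * 2 ** n)]
print(piecewise_bound(nu, stages))        # grows like nu * ln n per stage
print(nu * (1.0 + math.log(n * 2 ** n)))  # single-stage bound from Theorem 5
```

The single-stage bound pays for the full range of $\Phi_f$ (a term linear in $n$ inside the logarithm), whereas the piece-wise bound pays only a logarithmic term per part.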

Definition 10. Suppose that $\Phi$ is a family of feasible drift functions for $F$. We will say
that $\Phi$ is piece-wise polynomial (with respect to the (1+1) EA), if there is a constant $k$
and a family $\mathcal{M}$ of fitness-based $k$-partitions for $F$ such that for every $j \in \{1, \ldots, k\}$,
$\ln(\max \Phi_f(M^f_j)) - \ln(\min \Phi_f(M^f_j)) = O(\log n(f))$.

If $\Phi$ is a family of feasible drift functions for a (1+1) EA for minimising $F$, and $\Phi$ is
piece-wise polynomial with respect to the EA, then the optimisation time bound given
by Lemma 9 is $O(n(f) \log n(f))$.
3 Construction of the Drift Function



Let $F$ be a linear family of objective functions over bit strings (see Definition 2). Fix a
constant $c$ and consider the (1+1) EA for minimising $F$ with independent bit-mutation
rate $p_n = c/n$. We aim to construct a family $\Phi$ of feasible drift functions for the EA
which is piece-wise polynomial with respect to the EA.

3.1 Notation and parameters 

Recall from Definition 1 that $\Omega_f = \{0,1\}^{n(f)}$ and that an element $x \in \Omega_f$ is written as
a string of $n(f)$ bits, $x = x_{n(f)} \ldots x_1$. In the proof, we shall often use the word "left" to
refer to the most-significant bit (with the largest index, index $n(f)$) of $x$ and "right" to
refer to the least-significant bit (with the smallest index, index 1).

The proof will use several parameters, which we discuss here. We start by fixing
an arbitrarily small positive constant $\varepsilon$. This constant will be used to formulate
the intermediate results precisely. To define the family $\Phi$, we will use a sufficiently large
constant $K > 1$ (depending on $c$ and $\varepsilon$) and a sufficiently small positive constant $\gamma$
(depending on $c$, $\varepsilon$ and $K$).

3.2 Splitting into blocks 

The difficulty in defining a suitable drift function is that the optimisation of $f$ via the
EA heavily depends on the coefficients $a_i$. If these are steeply increasing, as in Example 1,
whether a new solution is accepted or not is determined by the value of the leftmost bit
that is flipped. On the other hand, if these are of comparable size, as in Example 2, the
difference between the number of "good" bit-flips (turning a 1 into a 0) and the number
of "bad" bit-flips (turning a 0 into a 1) determines whether a new solution is accepted.
Of course, the precise definitions of "steeply increasing" and "comparable size" depend
on the constant $c$ in the mutation probability. Also, an objective function $f$ can be of
a mixed type, having regions with steeply increasing coefficients and also regions where
coefficients are of comparable size.

Fix an objective function $f$ with $n(f) = n$. To analyse $f$ and define the corresponding
drift function $\Phi_f$, we split the bit positions $\{1, \ldots, n\}$ into blocks. The idea is that, within
a block, one of the two behaviours is dominant. The definition of blocks, naturally, has
to allow us to analyse the interaction between different blocks.

We first split the bit positions $\{1, \ldots, n\}$ into miniblocks. Start with $j = 1$. A
miniblock starting at bit position $j$ is constructed as follows. If $a_n/a_j < n^2$, then
$\{j, \ldots, n\}$ is a single miniblock. Otherwise, let $i$ be the minimum value in $\{j+1, \ldots, n\}$
such that $a_i/a_j \ge n^2$. Then the set $\{j, \ldots, i\}$ is a miniblock. If $i = n$, we are finished.
Otherwise, set $j = i$ and repeat to form the next miniblock, starting at bit position $j$.
Note that consecutive miniblocks overlap by one bit position.
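
The miniblock construction is a simple greedy scan from right to left. The sketch below is one possible rendering, with the ratio threshold passed in as a parameter (an assumption of this sketch; in the text it is a fixed power of $n$) and with comparisons written as multiplications to avoid dividing very large coefficients. The function name and the representation of a miniblock as a pair $(j, i)$ of 1-indexed endpoints are also assumptions made here.

```python
from typing import List, Sequence, Tuple


def split_into_miniblocks(a: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Split positions 1..n into miniblocks (j, i), listed right to left.

    a[i - 1] holds the coefficient a_i, with a_1 <= ... <= a_n and a_1 > 0.
    A miniblock starting at j ends at the smallest i > j whose coefficient
    is at least threshold * a_j, or at n if no such position exists.
    """
    if threshold <= 1:
        raise ValueError("threshold must exceed 1")
    n = len(a)
    blocks = []
    j = 1
    while True:
        if a[n - 1] < threshold * a[j - 1]:
            blocks.append((j, n))  # the ratio a_n / a_j stays below the threshold
            break
        i = next(i for i in range(j + 1, n + 1) if a[i - 1] >= threshold * a[j - 1])
        blocks.append((j, i))
        if i == n:
            break
        j = i  # the next miniblock starts at i, so the two overlap in position i
    return blocks


print(split_into_miniblocks([1, 1, 1, 100, 100, 10000], threshold=100))
print(split_into_miniblocks([1] * 5, threshold=25))
```

With equal coefficients, as in Example 2, the whole string forms a single miniblock; steeply increasing coefficients, as in Example 1, produce many short ones.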

The next thing that we do is merge consecutive pairs of miniblocks into blocks. To
start out with, we just go through the miniblocks from right to left, making a block out
of each pair of miniblocks. Note that this is (intentionally) different from just defining
blocks analogous to miniblocks with the $n^2$ replaced by $n^4$. Note further that again
consecutive blocks overlap in one bit position.

A block is said to be long if it contains at least $\gamma n$ bit positions (recall that the
parameter $\gamma$ is from Section 3.1) and short otherwise. It helps our analysis if any pair
of long blocks has at least three short blocks in between. So if two long blocks are
separated by at most two short blocks, then we combine the whole thing into a single
long block. We repeat this (at most a constant number of times since there are fewer than
$1/\gamma$ long blocks initially) until all remaining long blocks are separated by at least three
short blocks.
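
The pairing and merging steps can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: blocks are represented as pairs (rightmost position, leftmost position) listed from right to left, a block counts as long when it covers at least `gamma * n` positions, and a leftover unpaired miniblock at the left end simply forms a block on its own (the text does not spell out this case).

```python
from typing import List, Sequence, Tuple

Block = Tuple[int, int]  # (rightmost position, leftmost position), 1-indexed


def pair_into_blocks(minis: Sequence[Block]) -> List[Block]:
    """Merge consecutive pairs of miniblocks, going from right to left."""
    blocks = []
    for k in range(0, len(minis), 2):
        pair = minis[k:k + 2]
        blocks.append((pair[0][0], pair[-1][1]))
    return blocks


def merge_long_blocks(blocks: Sequence[Block], n: int, gamma: float) -> List[Block]:
    """Combine long blocks that are separated by at most two short blocks."""
    blocks = list(blocks)

    def is_long(b: Block) -> bool:
        return b[1] - b[0] + 1 >= gamma * n

    merged = True
    while merged:
        merged = False
        long_idx = [k for k, b in enumerate(blocks) if is_long(b)]
        for p, q in zip(long_idx, long_idx[1:]):
            if q - p - 1 <= 2:
                # Replace blocks p..q (and everything between) by one block.
                blocks[p:q + 1] = [(blocks[p][0], blocks[q][1])]
                merged = True
                break
    return blocks


print(pair_into_blocks([(1, 4), (4, 6), (6, 9)]))
print(merge_long_blocks([(1, 5), (5, 6), (6, 7), (7, 12)], n=12, gamma=0.4))
```

Each merge removes at least one block from the list, so the loop terminates.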

We will use $\ell_B$ to denote the leftmost bit position in block $B$ and $r_B$ to denote the
rightmost bit position in block $B$. As long as $B$ is not the leftmost block, we have
$a_{\ell_B}/a_{r_B} \ge n^4$.
3.3 Definition of $\Phi_f$

We will define weights $w_1, \ldots, w_n \in \mathbb{R}$ such that $\Phi_f(x) = \sum_{i=1}^{n} w_i x_i$. We call the $w_i$
weights to distinguish them from the coefficients $a_1, \ldots, a_n$ of $f$.

We define the weights $w_1, \ldots, w_n$ as follows, starting with $w_1 = 1$. Suppose that bit
position $i$ is in block $B$, that $i \ne r_B$, and that $w_{r_B}$ is already defined. If block $B$ is a long
block, or is immediately to the left of a long block, then we define $w_i$ by $w_i = w_{r_B} a_i / a_{r_B}$.
We call this the copy regime since $w_i/w_{r_B} = a_i/a_{r_B}$. Otherwise, we are in the damped
regime and we define $w_i$ by

$$w_i = w_{r_B} \min\{K^{(i - r_B)c/n}, \; a_i/a_{r_B}\},$$

where $K$ is the parameter from Section 3.1.
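
The weight definition translates into a single pass over the blocks from right to left. The sketch below assumes the blocks are already given as pairs $(r_B, \ell_B)$ together with a flag saying which of them are long, and it reads the damped-regime exponent as $(i - r_B)c/n$; the function name and argument layout are assumptions made here for illustration.

```python
from typing import List, Sequence, Tuple


def drift_weights(
    a: Sequence[float],
    blocks: Sequence[Tuple[int, int]],
    is_long: Sequence[bool],
    K: float,
    c: float,
) -> List[float]:
    """Weights w_1..w_n (returned with w_i at index i - 1).

    blocks lists (r_B, l_B) right to left; the first block starts at position 1
    and each block starts where the previous one ends. is_long[k] flags block k.
    """
    n = len(a)
    w = [0.0] * (n + 1)  # 1-indexed; w[0] is unused
    w[1] = 1.0
    for k, (r, l) in enumerate(blocks):
        # Copy regime: the block is long, or sits immediately left of a long block.
        copy = is_long[k] or (k > 0 and is_long[k - 1])
        for i in range(r + 1, l + 1):
            ratio = a[i - 1] / a[r - 1]
            if copy:
                w[i] = w[r] * ratio
            else:
                w[i] = w[r] * min(K ** ((i - r) * c / n), ratio)
    return w[1:]


# Copy regime reproduces the coefficient ratios; damped regime caps the growth.
print(drift_weights([1, 3, 9], [(1, 3)], [True], K=16.0, c=1.0))
print(drift_weights([1, 100, 10000], [(1, 3)], [False], K=16.0, c=1.0))
```

In the damped example the weights grow by at most a factor $K^{c/n}$ per position even though the coefficients grow by a factor 100.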

It will be a major effort in the remainder of the paper to show that this $\{\Phi_f \mid f \in F\}$
is a feasible family of drift functions for the EA. It is easier to see that $\{\Phi_f\}$ is piece-wise
polynomial with respect to the EA, so we do this next.

Lemma 11. Let $F$ be a linear family of objective functions over bit strings. Consider the
(1+1) EA for minimising $F$ with independent bit-mutation rate $p_n = c/n$. The family
$\Phi = \{\Phi_f\}$ of drift functions constructed above is piece-wise polynomial with respect to
the EA.

Proof. Let $k = 6\lceil 1/\gamma \rceil + 1$. We now construct a family of fitness-based $k$-partitions
for $F$.

Let $f$ be an objective function in $F$ and let $n = n(f)$. We now define the partition
$\mathcal{M}^f$. We call a bit position $i \in [2..n]$ a jump (for the objective function $f$) if

• $i$ is in a copy regime, and

• $w_i/w_{i-1} \ge n^2$.

By the construction of the blocks, such a bit position $i$ is the leftmost bit position of a
miniblock contained in either (1) a long block, or (2) a short block immediately to the left
of a long block. Since there are at most $\lceil 1/\gamma \rceil$ long blocks, there are at most $k - 1$ jumps.
(The easiest way to see this is to think about the original long blocks, prior to any
merges. Each block contains two miniblocks. Within a long block $B$, there may be two
jumps, and there may be two in each of the two blocks to the left of $B$: the block
immediately to the left of $B$ is always in the copy regime, but the block to its left may
also be merged into a long block with $B$.) Suppose there are $k'$ jumps, and let $M^f_j = \emptyset$
for $k' + 1 < j \le k$.

Let $i_1, \ldots, i_{k'}$ be an increasing enumeration of the jumps. Set $i_0 = 1$ and $i_{k'+1} = n+1$
to ease the following definition. For $j = 1, \ldots, k'+1$, let $N_j$ be $\{i_{j-1}, \ldots, i_j - 1\}$ and
define

$$M^f_j = \{x \in \{0,1\}^n \mid \exists i \in N_j : x_i = 1 \;\wedge\; \forall i \ge i_j : x_i = 0\}.$$

Let $M^f_0 = \{0\}$. Informally, $N_j$ is the set of bit positions starting at the jump $i_{j-1}$ and
going up to, but not including, the jump $i_j$. So $\{N_j \mid 1 \le j \le k'+1\}$ is a partition of
the bit positions. Then $M^f_j$ is the set of bit strings $x$ which have the leftmost "1"-bit in
$N_j$.

In order to show that $\mathcal{M} = \{\mathcal{M}^f \mid f \in F\}$ is a family of fitness-based $k$-partitions
for $F$, we need only show that the following condition is satisfied: for all $i < j$, $x \in M^f_i$
and $y \in M^f_j$, we have $f(x) < f(y)$. The condition follows from the fact that
$a_i/a_{i-1} = w_i/w_{i-1} \ge n^2$ for all jumps $i$.

In order to show that $\Phi$ is piece-wise polynomial with respect to the EA, it remains to
prove that, for every $j \in \{1, \ldots, k\}$, $\ln(\max \Phi_f(M^f_j)) - \ln(\min \Phi_f(M^f_j)) = O(\log n(f))$.
Fix any such $j$. Let $r_f = \max \Phi_f(M^f_j) / \min \Phi_f(M^f_j)$. We show that $r_f$ is upper-bounded
by a polynomial in $n$.

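For a set of bit positions $I \subseteq \{1, \ldots, n\}$, let $\min I$ denote the minimum element in $I$
and let $\max I$ denote the maximum element. Since $w_1 \le \ldots \le w_n$,
$\min \Phi_f(M^f_j) = w_{\min N_j} = w_{i_{j-1}}$. Similarly, $\max \Phi_f(M^f_j) = \sum_{i=1}^{i_j - 1} w_i \le n \, w_{\max N_j} = n \, w_{i_j - 1}$.
Hence $r_f \le n \, w_{\max N_j} / w_{\min N_j}$.

We rewrite

$$\frac{w_{\max N_j}}{w_{\min N_j}} = \prod_{B} \frac{w_{\max(B \cap N_j)}}{w_{\min(B \cap N_j)}}, \qquad (1)$$

where $B$ runs over all miniblocks that have a non-empty intersection with $N_j$. Note that
the above is true because adjacent miniblocks intersect in exactly one bit position.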

If $B$ is a miniblock in a damped regime, then $w_{\max(B \cap N_j)}/w_{\min(B \cap N_j)} \le w_{\ell_B}/w_{r_B} \le
K^{(\ell_B - r_B)c/n}$. In consequence, the contribution of all weights in damped regimes to (1)
is at most a factor $K^c$.

What remains is the contribution of miniblocks in long blocks and in those short
blocks immediately to the left of a long block. Let $B$ be such a miniblock. If $B \cap N_j =
\{\ell_B\}$ then $w_{\max(B \cap N_j)}/w_{\min(B \cap N_j)} = 1$. Otherwise, note that

$$\frac{w_{\max(B \cap N_j)}}{w_{\min(B \cap N_j)}} \le \frac{w_{\ell_B}}{w_{r_B}} = \left(\frac{w_{\ell_B}}{w_{\ell_B - 1}}\right) \left(\frac{w_{\ell_B - 1}}{w_{r_B}}\right).$$

The first factor is at most $n^2$, since $\ell_B$ is not a jump; the second factor is
$w_{\ell_B - 1}/w_{r_B} = a_{\ell_B - 1}/a_{r_B} < n^2$ by the definition of a miniblock. □

3.4 Auxiliary results concerning the weights Wi 

Fix an objective function $f \in F$ and let $n = n(f)$. We will assume that $n$ is sufficiently
large with respect to the constants $c$, $\varepsilon$, $K$ and $\gamma$, since our objective is to construct
a family $\Phi$ of feasible drift functions for the EA and the definition of such a family
(Definition 4) is only concerned with sufficiently large $n$. The definition of $\Phi_f$ allows us
to prove a number of useful facts. The first of these uses a geometric series to bound
sums of weights in the damped regime.

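Lemma 12. Let $B_0, \ldots, B_k$ be a consecutive sequence of blocks (left to right) in the
damped regime with $\ell_{B_0} = r_{B_k} + t$. Then

$$\sum_{j \in B_0 \cup \cdots \cup B_k} w_j \le K^{tc/n} \, w_{r_{B_k}} \left(\frac{n}{c \ln K} + 1\right).$$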

Proof For < /i < t we have w^s^-/,, < K^^'/'^WrB^K-^''/'' . Now 

00 ^ 

j£BoU...UBk h=0 

Now K"/'^ = e(i'i^)^/" > 1 + (In A")c/n, so 

1 1 / n 

< z , = ^^ + 1 



1 _ i;^-^/" - 1 - -f-TTTTT^ \clnK 

l+(lnK)c/n 



□ 
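The last step of this proof rests on the elementary inequality $e^{a} \ge 1 + a$ with $a = (\ln K)c/n$. As a quick numerical sanity check, the following Python sketch compares the exact geometric-series factor $1/(1 - K^{-c/n})$ with the closed-form bound $n/(c \ln K) + 1$. The function name and the sample parameter values are illustrative assumptions of ours, not part of the construction.

```python
import math

def geometric_factor_and_bound(K, c, n):
    """Return (exact, bound) for the geometric series sum_{h>=0} K^(-h c / n).

    exact = 1 / (1 - K^(-c/n)),  bound = n / (c ln K) + 1.
    """
    exact = 1.0 / (1.0 - K ** (-c / n))
    bound = n / (c * math.log(K)) + 1.0
    return exact, bound

# Illustrative parameter choices (hypothetical, not taken from the paper).
for K, c, n in [(2.0, 0.5, 10), (10.0, 1.0, 100), (50.0, 3.0, 1000)]:
    exact, bound = geometric_factor_and_bound(K, c, n)
    print(f"K={K}, c={c}, n={n}: exact={exact:.4f}, bound={bound:.4f}")
```

In each sampled case the exact factor stays below the bound, and the gap is small, which suggests that little is lost in this step.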



The next lemma gives the relationship between the leftmost weight and the rightmost 
weight in a block in the damped regime. 

Lemma 13. If $B$ is a block in the damped regime with $\ell_B = r_B + t$ and $B$ is not the
leftmost block, then $w_{\ell_B}/w_{r_B} = K^{tc/n}$.



Proof. This follows from the definition of the weights in the damped regime, since 

OifB 

The second inequality follows from our assumption (at the beginning of Section [331) that 
n is sufficiently large with respect to K and c. □ 

Lemmas 12 and 13 give the following corollary.

Corollary 14. Let $B_0, \dots, B_k$ be a consecutive sequence of blocks (left to right) in the
damped regime with $\ell_{B_0} = r_{B_k} + t$. If $B_0$ is not the leftmost block then

$$\sum_{j \in B_0 \cup \dots \cup B_k} w_j \le w_{\ell_{B_0}} \left( \frac{n}{c \ln K} + 1 \right).$$

Corollary 14 gives the following upper bound for the sum of all weights contained in,
and to the right of, a short block.

Lemma 15. Let B be a short block that is not the leftmost block. Then 



Proof. If there is no long block to the right of $B$, then $B$ and all of the blocks to its right
are in the damped regime, so the result follows immediately from Corollary 14. Assume
therefore that there is a long block to the right of $B$. Let $L$ be the long block which is
closest to $B$ on its right. Let $S$ be the short block immediately to the left of $L$. Note
that $S$ might be the same block as $B$.

Suppose j € L. Recall that for all /i, € -L U 5, we haye ^ = Thus, since S is 
not the leftmost block, 

^ '^j -4 -4 ^ -4 

Wj = —Uj < —n a£„ = n Wi„ < n wi„. 

Qj Oj 

Since the Wj's increase with j, we conclude that wj < n~'^weg for any j < l^. Thus, 

Using the fact that S is short and the monotonicity of w, we deduce J2jes'^j — 
^nwig . Combining this with Corollary [T^ we obtain 



3 



□ 



4 Feasible Drift 

Our objective in this section is to prove the following lemma, which is the heart of the
proof of our main result.

Lemma 16. Let $F$ be a linear family of objective functions over bit strings. Consider
the (1+1) EA for minimising $F$ with independent bit-mutation rate $p_n = c/n$. There is
a function $\nu(n) = O(n)$ such that the family $\Phi = \{\Phi_f\}$ of drift functions constructed
above is $\nu$-feasible for the EA.

Consider running the EA with input $f$ with $n = n(f)$. We use the following notation.
The state after $t$ steps is a binary string $x[t] = x_n[t] \cdots x_1[t]$. Recall from Section 3.1 that
we write bit strings as words from most significant bit ("leftmost bit") to least significant.
In the $(t+1)$'st step of the algorithm, the bits of a binary string $y[t+1] = y_n[t+1] \dots y_1[t+1]$
encoding the mutation mask are chosen independently. The probability
that $y_j[t+1] = 1$ is $p_n = c/n$. Then $x'[t+1]$ is formed from $x[t]$ by flipping the bits that
are 1 in string $y[t+1]$. That is, $x'_n[t+1] \dots x'_1[t+1] = (x_n[t] \oplus y_n[t+1]) \dots (x_1[t] \oplus y_1[t+1])$.

Let $A_{t+1}$ be the event that $\sum_j a_j x'_j[t+1] \le \sum_j a_j x_j[t]$. We say that the mutation in
step $t+1$ is "accepted" in this case. If $A_{t+1}$ occurs, then $x[t+1] = x'[t+1]$. Otherwise,
$x[t+1] = x[t]$. Of course, the coefficients $a_j$, and therefore $A_{t+1}$ itself, depend implicitly
on $f$. Suppose that $x[t]$ is not the all-zero string. For a bit position $i$ with $x_i[t] = 1$, let
$I_i[t+1]$ be the event

$$y_i[t+1] = 1 \;\wedge\; \forall j \in \{i+1, \dots, n\} : (x_j[t] = 1) \Rightarrow (y_j[t+1] = 0).$$

$I_i[t+1]$ is the event that $i$ is the leftmost '1' to be considered for a flip in step $t+1$.
Finally, let $I'_\ell[t+1]$ be the event

$$\forall j \in \{\ell+1, \dots, n\} : (x_j[t] = 0) \Rightarrow (y_j[t+1] = 0).$$

$I'_\ell[t+1]$ is the event that the '0' bits to the left of $\ell$ are not considered for a flip in
step $t+1$. Note that $\Pr(I'_\ell[t+1]) \ge (1 - p_n)^n$ and that, given $x[t]$, the event $I'_\ell[t+1]$
is independent of $I_i[t+1]$ for any $i$ (the event $I_i[t+1]$ constrains $y_j[t+1]$ for some $j$
with $x_j[t] = 1$, whereas the event $I'_\ell[t+1]$ constrains $y_j[t+1]$ for $j$ with $x_j[t] = 0$).
However, these events are not independent if we condition on $A_{t+1}$, as the following
simple observation shows.
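To make the notation concrete, here is a minimal Python sketch of one step of the (1+1) EA as described above: draw the mutation mask, form the candidate string, and accept it if the objective does not get worse. The helper names (`objective`, `ea_step`, `leftmost_flipped_one`), the coefficient vector and the value $c = 1.5$ are hypothetical choices made for illustration only. Lists are 0-indexed, so list index $j-1$ holds bit position $j$.

```python
import random

def objective(a, x):
    """Linear objective f(x) = sum_j a[j] * x[j]."""
    return sum(aj * xj for aj, xj in zip(a, x))

def ea_step(x, a, c, rng):
    """One step of the (1+1) EA with mutation rate p_n = c/n (minimisation).

    Returns (next_state, mask, accepted).
    """
    n = len(x)
    p = c / n
    y = [1 if rng.random() < p else 0 for _ in range(n)]  # mutation mask y[t+1]
    x_prime = [xj ^ yj for xj, yj in zip(x, y)]           # candidate x'[t+1]
    accepted = objective(a, x_prime) <= objective(a, x)   # event A_{t+1}
    return (x_prime if accepted else x), y, accepted

def leftmost_flipped_one(x, y):
    """Return the bit position i (1-based) for which I_i[t+1] holds, else None."""
    for idx in range(len(x) - 1, -1, -1):  # scan from the leftmost bit
        if x[idx] == 1 and y[idx] == 1:
            return idx + 1
    return None

rng = random.Random(0)               # fixed seed, purely for reproducibility
a = [1, 2, 4, 8, 16, 32, 64, 128]    # hypothetical non-decreasing coefficients
x = [1] * len(a)
steps = 0
while any(x) and steps < 100_000:    # step cap keeps the demo bounded
    x, y, accepted = ea_step(x, a, c=1.5, rng=rng)
    steps += 1
print("reached all-zero string:", not any(x), "after", steps, "steps")
```

Because a rejected candidate leaves the state unchanged, the objective value along a run never increases; the drift functions $\Phi_f$ analysed in this section measure progress in a different, weight-based potential.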

Lemma 17. Let $i$ be a bit position contained in some block $B$. Assume that there is a
block $L$ immediately to the left of $B$. Then $I_i[t+1]$ and $A_{t+1}$ imply $I'_{\ell_L}[t+1]$.

Proof. There is nothing to show if L is the leftmost block. Hence assume that it is not. 
Then in particular, a^^ > n'^a^j^. 

Assume that Ii\t + 1] occurs and I[^\t + 1] does not. Let /c > be such that 
yk[t+l] = 1 andxfc[t] = 0. Then X]"=i aj(2;j[i+l] -a^j [t]) > ak-J2j<i'^j ^ ak-noi > 0, 
because > a^^ > n^Or^ > n^Oj. Hence this mutation is not accepted, that is, At+i 
does not occur. □ 

Recall that $\varepsilon \in (0, 1)$, $K$, and $\gamma$ are parameters defined in Section 3.1. We take $\varepsilon$ to
be "sufficiently small". Then $K > 1$ is taken to be "sufficiently large" (depending on $c$
and $\varepsilon$) and then $\gamma \in (0, 1)$ is taken to be "sufficiently small" (depending on $c$, $\varepsilon$ and
$K$). Finally, we take $n_0 > 1$ to be any integer which is "sufficiently large" with respect
to all of these parameters. The actual constraints that we use (to determine what is
"sufficiently large" and what is "sufficiently small") will be spelled out below. Note that
$(1 - \frac{c}{n})^n$ approaches $\exp(-c)$ from below as $n \to \infty$. We choose $n_0$ so that $(1 - \frac{c}{n_0})^{n_0}$
is "sufficiently close" to $\exp(-c)$ (with respect to $c$, $\varepsilon$ and $K$). We can conclude from
this that $(1 - \frac{c}{n})^n$ is "sufficiently close" to $\exp(-c)$ for any $n \ge n_0$. Similarly, $(1 - \frac{c}{n})^{3n}$
approaches $\exp(-3c)$ from below as $n \to \infty$. We will choose $n_0$ to ensure that, for
$n \ge n_0$, this is "sufficiently close" to $\exp(-3c)$.
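The convergence of $(1 - c/n)^n$ to $\exp(-c)$ from below is easy to observe numerically. The Python sketch below tabulates both this quantity and its third power for one illustrative value of $c$; the function name and the sampled values of $c$ and $n$ are our own assumptions.

```python
import math

def no_flip_probability(c, n, multiple=1):
    """(1 - c/n)^(multiple * n): chance that multiple*n independent bits all stay unflipped."""
    return (1.0 - c / n) ** (multiple * n)

c = 2.0  # illustrative constant, not a value used in the paper
for n in (10, 100, 1000, 10000):
    print(n,
          no_flip_probability(c, n), math.exp(-c),
          no_flip_probability(c, n, 3), math.exp(-3 * c))
```

For $n > c$ the sequence is increasing in $n$, which is why fixing $n_0$ controls the approximation error for every $n \ge n_0$ at once.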

Proof of Lemma 16. The first two conditions in Definition 4 follow from the construction
of $\Phi$ in Section 3.3. The third condition follows from Lemma 18 below. □

The following lemma is the main ingredient in the short proof of Lemma 16 above.
It establishes the third condition in Definition 4, so it allows us to conclude that $\Phi$ is
$\nu$-feasible for the EA. Since, by Lemma 11, $\Phi$ is also piece-wise polynomial with respect
to the EA, Lemma 9 enables us to repeatedly apply Lemma 18 to bound the expected
optimisation time of the EA.

Lemma 18. Let $F$ be a linear family of objective functions over bit strings. Consider
the (1+1) EA for minimising $F$ with independent bit-mutation rate $p_n = c/n$. Let $f$ be
an objective function in $F$ with $n(f) \ge n_0$. For all $x \in \{0, 1\}^{n(f)} \setminus \{0\}$,

$$E[\Phi_f(x[t+1]) \mid x[t] = x] \le \left(1 - \frac{c}{n(f)}\, e^{-3c} (1 - \varepsilon)^2\right) \Phi_f(x).$$

Proof. Fix $f \in F$ with $n(f) \ge n_0$. Let $n = n(f)$. Note that, for any fixed $x[t]$,

$$E[\Phi_f(x[t]) - \Phi_f(x[t+1])] = \sum_{i : x_i[t] = 1} \Pr(I_i[t+1])\, E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]], \qquad (2)$$

since the events $I_i[t+1]$ for $1 \le i \le n$ are disjoint and $\Phi_f(x[t]) = \Phi_f(x[t+1])$ unless
one of them occurs. In each of various cases (see Subsections 4.1 to 4.5), we will show
that, for all $i$ with $x_i[t] = 1$,

$$E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]] \ge (1 - p_n)^{2n} w_i (1 - \varepsilon), \qquad (3)$$

which is greater than or equal to 0 since $n \ge n_0 > c$ and $\varepsilon < 1$. Using the lower bound
$\Pr(I_i[t+1]) \ge p_n (1 - p_n)^n$, which applies for every $i$ with $x_i[t] = 1$, Equations (2) and (3)
give

$$E[\Phi_f(x[t]) - \Phi_f(x[t+1])] \ge p_n (1 - p_n)^n \sum_{i : x_i[t] = 1} E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]]$$

$$\ge p_n (1 - p_n)^n (1 - p_n)^{2n} (1 - \varepsilon)\, \Phi_f(x[t]),$$

so

$$E[\Phi_f(x[t+1])] \le \left(1 - p_n (1 - p_n)^{3n} (1 - \varepsilon)\right) \Phi_f(x[t]).$$

Since $(1 - p_n)^{3n} \ge e^{-3c}(1 - \varepsilon)$ for $n \ge n_0$, this will complete the proof.

It remains to prove Equation (3). We do this in Subsections 4.1 to 4.5. In each case,
$B$ is the block containing bit position $i$, $L$ is the block to the left of $B$ (if it exists) and $R$
is the block to the right of $B$ (if it exists). Figure 1 depicts some blocks (two short blocks
followed by a long block, followed by a short block divided into two miniblocks, followed
by another short block). For each possible location of the bit position $i$, it names the
relevant case. Every long block is covered by Case 5. Blocks immediately to the left of a long block
are covered by Case 3 and blocks immediately to the right of a long block are covered
by Case 4, then Case 2. Everything else is covered by Case 1.

[Figure 1: The cases that are used to prove Equation (3). Block labels, left to right: Case 1, Case 3, Case 5, Case 4 | Case 2, Case 1.]

□

For all of the following cases, fix $f \in F$ with $n(f) \ge n_0$. Let $n = n(f)$. Fix $x[t]$ with
$x_i[t] = 1$ for a bit position $i$ in block $B$. Recall from the proof of Lemma 18 that the
goal is to prove (3). That is, we must show that

$$E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]] \ge (1 - p_n)^{2n} w_i (1 - \varepsilon).$$

4.1 Case 1 

For this case, assume that $B$ is not long and that blocks adjacent to $B$ are not long
either.

If $B$ is not the leftmost block, then let $L$ be the block to $B$'s left. The case in which
$B$ is the leftmost block is actually easier, but to avoid repetition, in this case, let $L$ be
the block consisting of the single bit position $\ell_B$. The following argument now applies
whether $L$ is a real block or just a single bit position.

We will condition on $I_i[t+1]$. By Lemma 17 we know that if this mutation is accepted
(so $A_{t+1}$ occurs), then the event $I'_{\ell_L}[t+1]$ occurs. Also, $\Pr(I'_{\ell_L}[t+1] \mid I_i[t+1]) \ge (1 - p_n)^n$,
as we noted earlier. Thus $E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]]$ is equal to

$$\Pr(I'_{\ell_L}[t+1] \mid I_i[t+1]) \cdot E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1], I'_{\ell_L}[t+1]]. \qquad (4)$$

Let $P = \Pr(A_{t+1} \mid I_i[t+1], I'_{\ell_L}[t+1])$. Note that $P \ge (1 - p_n)^n$ (since, for example, $A_{t+1}$
occurs if $y_j[t+1] = 0$ for $j \ne i$). Now $\Phi_f(x[t]) - \Phi_f(x[t+1]) = \sum_{j=1}^{n} w_j (x_j[t] - x_j[t+1])$.
If $I_i[t+1]$ and $I'_{\ell_L}[t+1]$ occur, then this is $\sum_{j \le \ell_L} w_j (x_j[t] - x_j[t+1])$. If $A_{t+1}$ also
occurs, then $x_i[t] - x_i[t+1] = 1$ so this is $w_i + \sum_{j \le \ell_L,\, j \ne i} w_j (x_j[t] - x_j[t+1])$. Thus, the
quantity in (4) is at least

$$(1 - p_n)^n \left( w_i P - \sum_{j \le \ell_L,\, j \ne i} w_j \Pr(y_j[t+1] = 1 \mid I_i[t+1], I'_{\ell_L}[t+1]) \right)$$

$$\ge (1 - p_n)^n \left( w_i (1 - p_n)^n - \sum_{j \le \ell_L,\, j \ne i} w_j p_n \right).$$

Now, by Lemma 15, we have



E 



2n 



h 2 + 7n + n 



> {l-pnTwi 



rlnK 

To see this, apply the lemma directly to $L$ if it is not the leftmost block (and note
that $w_{\ell_L} \le K^{2c\gamma} w_{r_B}$). If $L$ is the leftmost block (and $B$ is not) then apply Lemma 15
to block $B$ (noting that $w_{\ell_B} \le K^{c\gamma} w_{r_B}$) and use Lemma 12 to sum the weights in $L$.
Finally, if $B$ is the leftmost block then apply Lemma 15 to the short block to the right
of $B$ and use Lemma 12 to sum the weights in $B$.
Using this and $w_{r_B} \le w_i$ we have

$$E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]] \ge (1 - p_n)^n w_i \left( (1 - p_n)^n - \frac{K^{2c\gamma}}{\ln K} - \frac{2c}{n} K^{2c\gamma} - \gamma c K^{2c\gamma} - \frac{c}{n^4} K^{2c\gamma} \right).$$

By the choice of the parameters in Section 3.1 and since $n \ge n_0$, each of $\frac{K^{2c\gamma}}{\ln K}$,
$\frac{2c}{n} K^{2c\gamma}$, $\gamma c K^{2c\gamma}$ and $\frac{c}{n^4} K^{2c\gamma}$ is at most $(1 - p_n)^n \varepsilon/4$, so Equation (3) holds, as required.
To see this, recall (from the text just after Lemma 17) that $\varepsilon$ is taken to be "sufficiently
small", then $K > 1$ is taken to be "sufficiently large" (depending on $c$ and $\varepsilon$) and then
$\gamma \in (0,1)$ is taken to be "sufficiently small" (depending on $c$, $\varepsilon$ and $K$). Finally, we
take $n_0 > 1$ to be any integer which is "sufficiently large" with respect to all of these
parameters, in particular, guaranteeing that $(1 - p_n)^n$ is "sufficiently close" to $\exp(-c)$
for any $n \ge n_0$. It is easy to see that $\frac{c}{n^4} K^{2c\gamma}$ and $\frac{2c}{n} K^{2c\gamma}$ are sufficiently small, since
$n_0$ is chosen after the other parameters (so these terms can be made arbitrarily small
as compared to $\exp(-c)\varepsilon/4$). Similarly, $\gamma c K^{2c\gamma}$ is sufficiently small because $\gamma$ is chosen
to be sufficiently small with respect to $\varepsilon$, $c$ and $K$. Finally, $\frac{K^{2c\gamma}}{\ln K}$ is sufficiently small
because $\gamma$ can be chosen as small as we like with respect to the other parameters. (That
is, first $K$ is made sufficiently large with respect to $c$ and $\varepsilon$ and then $\gamma$ is defined.) For
example, setting $\gamma = \ln\left(\frac{\varepsilon}{8} e^{-c} \ln K\right)/(2c \ln K)$ gives $\frac{K^{2c\gamma}}{\ln K} = e^{-c}\varepsilon/8$.
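The closing example can be checked by direct substitution. The Python sketch below assumes our reading of that example, namely $\gamma = \ln((\varepsilon/8) e^{-c} \ln K)/(2c \ln K)$, and confirms numerically that it yields $K^{2c\gamma}/\ln K = e^{-c}\varepsilon/8$. The function name and the sample values are hypothetical; the function is parametrised by $\ln K$ because $K$ itself must be very large for $\gamma$ to be positive.

```python
import math

def example_gamma(c, eps, ln_K):
    """gamma = ln((eps/8) * e^(-c) * ln K) / (2 c ln K), parametrised by ln K."""
    arg = (eps / 8.0) * math.exp(-c) * ln_K
    if arg <= 1.0:
        # The logarithm would be non-positive, so gamma would not lie in (0, 1).
        raise ValueError("ln K is too small: gamma would not be positive")
    return math.log(arg) / (2.0 * c * ln_K)

c, eps, ln_K = 1.0, 0.1, 300.0                   # illustrative values only
gamma = example_gamma(c, eps, ln_K)
term = math.exp(2.0 * c * gamma * ln_K) / ln_K   # K^(2 c gamma) / ln K
print(gamma, term, math.exp(-c) * eps / 8.0)
```

The guard in `example_gamma` reflects the order in which the parameters are fixed: $K$ has to be large relative to $c$ and $\varepsilon$ before $\gamma$ is defined.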

4.2 Case 2 

For this case, assume that the block $L$, immediately to the left of $B$, is long, and that $i$
is in the rightmost miniblock of block $B$ (which is therefore short).

This is very similar to Case 1. As in Case 1, we will condition on $I_i[t+1]$. Where
Case 1 uses Lemma 17, we use exactly the same argument to show that, if this mutation
is accepted (so $A_{t+1}$ occurs), then event $I'_{\ell_B}[t+1]$ occurs. From that point the argument
proceeds exactly as in Case 1, replacing "$\ell_L$" with "$\ell_B$". We use Lemma 15 to obtain
the upper bound

$$\sum_{j \le \ell_B} w_j \le K^{c\gamma} w_{r_B} \left( \frac{n}{c \ln K} + 1 + \gamma n + n^{-3} \right).$$

The rest of the argument is exactly the same as in Case 1.

4.3 Case 3

For this case, assume that $B$ is immediately to the left of a long block $R$. Hence both
$B$ and $R$ are in the copy regime.

If $B$ is not the leftmost block, then there is a block $L$ immediately to the left of $B$.
Block $L$ is short, since any pair of long blocks has at least three short blocks between them. Thus,
$L$ is in the damped regime. If $B$ is the leftmost block, to keep notation simple, we add
an artificial block $L = \{\ell_B\} = \{n\}$.

Note that 

Wj < nw£„ = nwe„ — — <n Wi. (5) 

Let $Y$ be the set of $n$-bit binary strings so that, if $y[t+1] = y$, then $I_i[t+1]$ occurs
and $A_{t+1}$ occurs (the move in step $t+1$ is accepted). We first analyse the effect of such
a mutation. Let $y \in Y$. As in Case 1, $A_{t+1}$ implies $I'_{\ell_L}[t+1]$. Consequently, we have
$y_j = 0$ for all $j$ that satisfy $j > \ell_L$, or both $j > i$ and $x_j[t] = 1$. Thus, by the definition
of $A_{t+1}$, we have

$$\sum_{j \le \ell_L} a_j \left( (x_j[t] \oplus y_j) - x_j[t] \right) \le 0.$$

We compute 

j^L:yj=l,Xj[t]=Q jeB\jR:yj = l,Xj[t\=0 j^BUR:yj=l,Xj[t]=l j<rR 

Dividing through by $a_i$, we have

j&L:yj=l,Xj[t]=Q * j&BVjR:yj=l,Xj[t]=Q * j&BVjR:yj=l,Xj[t\=l * 
Now for $j$ in the copy regime (blocks $B$ and $R$), $a_j/a_i = w_j/w_i$. Also, for $j \in L$ (which
is in the damped regime),

Wj = min(iC^*~''^'*'^/"', aj/a^^) < Wrj^—^ = Wr^ — —^ = Wi — , 

so $a_j/a_i \ge w_j/w_i$. Hence, replacing $a_j/a_i$ with $w_j/w_i$ and multiplying through by $w_i$,
we have

j£L:yj=l,Xj[t]=0 jeBUR:yj=l,Xj[t]=0 jeBUR:yj=l,Xj[t]=l 

For the mutation being random (but conditioning on $I_i[t+1]$ and $I'_{\ell_L}[t+1]$), we compute
the following. Let $E_1 = E\left[\sum_{j \in L \cup B \cup R} w_j (x_j[t] - x_j[t+1]) \mid I_i[t+1], I'_{\ell_L}[t+1]\right]$.
Then

$$E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1], I'_{\ell_L}[t+1]] = E_1 + \sum_{j < r_R} w_j\, E[x_j[t] - x_j[t+1] \mid I_i[t+1], I'_{\ell_L}[t+1]],$$

which is at least $E_1 - n^{-3} w_i$ by (5). Also,

$$E_1 = \sum_{y \in Y} \Pr(y[t+1] = y \mid I_i[t+1], I'_{\ell_L}[t+1]) \sum_{j \in L \cup B \cup R} w_j \left( x_j[t] - (x_j[t] \oplus y_j) \right).$$

Each $y$ with $y_j = 0$ for all $j \ne i$ contributes at least $(1 - p_n)^n w_i$ to the outer sum.
All other strings $y$ contribute at least $-n^{-3} w_i$ by (6). Now, putting it together, we find
that

$$E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]]$$

$$= \Pr(I'_{\ell_L}[t+1] \mid I_i[t+1])\, E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1], I'_{\ell_L}[t+1]]$$

$$\ge \Pr(I'_{\ell_L}[t+1] \mid I_i[t+1]) \left( E_1 - n^{-3} w_i \right)$$

$$\ge \Pr(I'_{\ell_L}[t+1] \mid I_i[t+1]) \left( (1 - p_n)^n w_i - n^{-3} w_i - n^{-3} w_i \right)$$

$$\ge (1 - p_n)^n \left( (1 - p_n)^n w_i - n^{-3} w_i - n^{-3} w_i \right).$$

Now, $2n^{-3} \le \varepsilon (1 - p_n)^n$, so we have established Equation (3), as required.

4.4 Case 4 

For this case, assume that the block $L$, immediately to the left of $B$, is long, and that $i$
is in the leftmost miniblock of block $B$ (which is short).

Let $Y$ be the set of $n$-bit binary strings so that, if $y[t+1] = y$, then $I_i[t+1]$ occurs
and $A_{t+1}$ occurs (the move in step $t+1$ is accepted). As in Case 3, $A_{t+1}$ implies $I'_{\ell_L}[t+1]$.
Hence for every $y \in Y$ we have $y_j = 0$ if $j > \ell_L$ or if $j > i$ and $x_j[t] = 1$. Thus, if $y \in Y$
then, by the definition of $A_{t+1}$, we have

$$0 \le \sum_{j \le \ell_L} a_j \left( x_j[t] - (x_j[t] \oplus y_j) \right). \qquad (7)$$

To derive an upper bound in the right-hand side of Equation ([7]) we split the summa- 
tion into three easily-bounded parts. The summation over j ^ L — {ri,} is equal to 
- Y.rL<j<iL:yj=i "i' summation over j € -B is at most Y.j&B:x^[t]=i,y^=i and the 
summation over j < is at most na^^ < aj/n. From ([7|) we thus have 

aj < ^ aj + ai/n. (8) 

jeL-{rL}:yj=l j(zB:Xj[t]=l,yj=l 

Define 

We will show that, for $y \in Y$,

$$\sum_{j \in [n]} w_j \left( x_j[t] - (x_j[t] \oplus y_j) \right) \ge \Psi(y).$$

Start by breaking up the left-hand side as

jeL-{rL}:yj=l jeB:Xj[t]=l,yj = l jeB:Xj[t]=0,yj=l j<rB 



Recall that for $j \in L$, we have $w_j = \frac{w_{r_L}}{a_{r_L}} a_j$, whereas for $j \in B$, we have



where the final inequality uses the fact that $B$ is short, that is, $\ell_B - r_B \le \gamma n$.
Thus, the sum of the first two terms in (9) is at least

Wr^ \ ^ VJ. 



'•i jGL-{r£}:s,,=l ''^ jeB:Xj[t]=l,y,=l 

and by (8), this is at least



which is at least 



(1 - K-^>^ - ^ V 



Cir> I- TX Qir- J- Ci' 



jeB-{i}:Xj [t]=l,yj = l 

Upper-bounding aj with ^p^WjK'^'^ in the last term, we find that ([9]) is at least 



(1 - K~^'')ai - K^' Yl 

jeB-{i}:Xj [t] = l,yj = l 



j&B:Xj[t\=0,yj = l j<rB 

Combining the summations, this is at least 



Upper-bounding $a_i$ with $a_{r_L}$, the first two terms are at least $-(1 + \frac{1}{n} - K^{-c\gamma}) w_{r_L}$. Then
upper-bounding $w_{r_L}$ with $K^{c\gamma} w_i$, the whole thing is at least $\Psi(y)$.
We have shown that, for $y \in Y$,

$$\sum_{j \in [n]} w_j \left( x_j[t] - (x_j[t] \oplus y_j) \right) \ge \Psi(y).$$

Suppose that $y$ is an $n$-bit binary string such that, if $y[t+1] = y$, then $A_{t+1}$ does not
occur. In this case, we also have

since $\Psi(y) \le 0$.

Now let $y[t+1]$ be random as constructed by the algorithm. Denote by $y^*$ the
bit string that contains exactly one one-entry, namely the one on position $i$. Let $P$ be
the probability (conditional on $I_i[t+1]$) that $y[t+1] = y^*$. Now

E[<^fix[t])-<^fix[t + l])\I,[t + l]] 

= Yl + 1] = y I + 1]) E "^A^M - i^M ® %■)) 

= Y,^T{y[t + l]=y\Ii[t + l])^{y) + 

Y + 1] = y I ii[t + M)[Y1 "^j^^M - i^jii] © vj)) - ^(y) 

y& \ie[n] 

= E[^{y[t + 1]) \h[t + 1]] + Y P^-(y[* + 1] = y I h[t + 1]) I E ""A^Ai] - i^At] © vj)) - 

\jG[n] 

> i?[vl/(y[t + 1]) \h[t + 1]] + P{-^{y*) + ^f{x[t]) - ^f{x[t] ® y*)) 

where the first inequality comes by ignoring terms $y \in Y - \{y^*\}$ (since these are
non-negative).

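Consider the first term,

$$-\left(1 + \tfrac{1}{n} - K^{-c\gamma}\right) K^{c\gamma} w_i = -\left(K^{c\gamma} - 1 + \tfrac{K^{c\gamma}}{n}\right) w_i.$$

By our choice of $\gamma$, $K^{c\gamma} - 1$ is very small (see the discussion at the end of Case 1). Since
$n \ge n_0$,

$$K^{c\gamma} - 1 + \frac{K^{c\gamma}}{n} \le (\varepsilon/3) \left(1 - \frac{c}{n}\right)^n.$$

Now by the definitions of $I_i[t+1]$ and $y^*$, $P = (1 - p_n)^{n-1-C}$, where $C$ is the number of
bits $j > i$ such that $x_j[t] = 1$. Thus, $P \ge (1 - p_n)^n$ so

$$(\varepsilon/3) \left(1 - \frac{c}{n}\right)^n \le (\varepsilon/3) P.$$

We conclude that the first term is at least $-(\varepsilon/3) P w_i$. Using Lemma 15 and $w_{\ell_B} \le
K^{c\gamma} w_i$, we obtain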
Given the constraints on our parameters (see the discussion at the end of Case 1), each
of the four summands, $\frac{K^{2c\gamma}}{\ln K}$, $\frac{c}{n} K^{2c\gamma}$, $K^{2c\gamma} \gamma c$ and $\frac{c}{n^4} K^{2c\gamma}$, is at most $(\varepsilon/12)P$. Thus, the
second term, $-K^{c\gamma} \sum_{j \le \ell_B,\, j \ne i} p_n w_j$, is at least $-(\varepsilon/3) P w_i$. In a similar way, we see
that the third term,

$$P \left( \left(1 + \tfrac{1}{n} - K^{-c\gamma}\right) K^{c\gamma} w_i + w_i \right),$$

is at least $P w_i (1 - \varepsilon/3)$. We conclude that

$$E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]] \ge P w_i (1 - \varepsilon),$$

which establishes Equation (3), as required.

4.5 Case 5 

For this case, assume that $B$ is a long block.

To the right of $B$ there might be a short block $R$; otherwise $r_B = 1$ and we define
$R = \{r_B\}$ to ease notation. To the left of $B$, there might be a short block $L$; otherwise
$\ell_B = n$ and we define $L = \{\ell_B\}$ to ease notation.

Let $Y$ be the set of $n$-bit binary strings so that, if $y[t+1] = y$, then $I_i[t+1]$ occurs
and $A_{t+1}$ occurs (so the move in step $t+1$ is accepted). As in Case 4, $A_{t+1}$ implies
$I'_{\ell_L}[t+1]$. Hence for every $y \in Y$ we have $y_j = 0$ for $j > \ell_L$ and for all $j > i$ satisfying
$x_j[t] = 1$. Thus, if $y \in Y$, then, by the definition of $A_{t+1}$, we have

o< Y,^ji^At]-i^j[t]®yj)) 

< ^ aj{xj[t] - {xj[t] ®yj)) + ain~^ 

rR<j<^L 

< ^ aj{xj[t] - {xj[t]®yj)) + J2 aj + ain~^. (10) 

rB<j<(^L j(^R;yj=l;Xj[t]=l 

We will use the fact that for $j \in L \cup B$, we have $w_j = \frac{w_i}{a_i} a_j$ since we are in the copy
regime, whereas for $j \in R$, we are in the damped regime, so we have

Plugging this into (10), we obtain



\j£R;yj = l;Xj[t]=l J j(zR;yj=l;Xj[t]=l 



tb 
Let ^'(y) = —K^^ ^ Wj — WiU ^. From the above,
J];u;j-(xj-[t]-(xj-[t]eyj)) 

jeR;yj=l;Xj[t]=l jeR;yj=l;xj[t]=0 j<rR;yj=l 

We have shown that, if $y \in Y$ (so $I'_{\ell_L}[t+1]$ occurs), then

$$\sum_{j \in [n]} w_j \left( x_j[t] - (x_j[t] \oplus y_j) \right) \ge \Psi'(y).$$

Suppose now that $y$ is an $n$-bit binary string such that, if $y[t+1] = y$, then $A_{t+1}$
does not occur. In this case, we also have

Y - (^iW ®yj)) = 0>^{y), 

since $\Psi'(y) \le 0$.

Now let $y[t+1]$ be random as constructed by the algorithm. Denote by $y^*$ the
bit string that contains exactly one "1"-entry, namely on position $i$. Let $P$ be the
probability (conditional on $I_i[t+1]$) that $y[t+1] = y^*$. Now, as in Case 4,

$$E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]] \ge E[\Psi'(y[t+1]) \mid I_i[t+1]] + P \left( -\Psi'(y^*) + \Phi_f(x[t]) - \Phi_f(x[t] \oplus y^*) \right).$$

Using Lemma [15] and wi^^ < Wi, we obtain 

Since each of the summands, , K^'^^c^ ^^"^ '^"^ ^® most {e/5)P (for 

$n \ge n_0$), we have $E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]] \ge P w_i (1 - \varepsilon)$, which gives
Equation (3), as required.

The cases that we have just completed conclude the proof of Lemma 18, which was
used in the proof of Lemma 16. We are now ready to prove Theorem 7.

Proof of Theorem 7. By Lemma 16 there is a function $\nu(n) = O(n)$ such that the
family $\Phi = \{\Phi_f\}$ of drift functions that we have constructed is $\nu$-feasible for the EA.
By Lemma 11 this family of drift functions is piece-wise polynomial with respect to the
EA. The result now follows from Lemma 9 (using Definition 10). □

5 A Simple Lower Bound



The following theorem complements Theorem 7, showing that it cannot be improved by
more than a constant factor. This extends Lemma 10 in [9] using the same proof idea.

Theorem 19. Let $c$ be a positive constant. Let $\hat{c} = \max\{1, c\}$. Let $F$ be a family
of linear objective functions over bit strings. Consider the (1+1) EA for minimising
$F$ with independent bit-mutation rate $p_n = c/n$. There is a constant $n_0$ such that,
for any $f \in F$ with $n(f) \ge n_0$, the probability that the optimisation time is at most
n{f) ln(n(/))/(2(c + 1)) is at most e^p{-n{ff^^^). 

Proof. Write $\hat{c} = \max\{1, c\}$, as in the theorem statement. Let $n_0$ be any integer so that
$(1 - \frac{\hat{c}}{n_0})^{n_0} \ge \exp(-(\hat{c}+1))$. It is easy to see that such
an $n_0$ exists, since $(1 - \frac{\hat{c}}{n})^{n}$ converges, from below, to $\exp(-\hat{c})$, as $n \to \infty$. Consider
an input $f \in F$ with $n(f) \ge n_0$. Let $n = n(f)$ and let

$$T = \frac{1}{2(\hat{c}+1)}\, n \ln n.$$

The probability that a particular bit position is not touched by any mutation step
during $T$ iterations is at least

$$(1 - p_n)^T \ge (1 - \hat{c}/n)^T \ge \exp(-(\hat{c}+1)T/n) = n^{-1/2}.$$

By a Chernoff bound, the probability that the initial solution $x$ (which is chosen
uniformly at random from $\{0, 1\}^n$) has at least $n/3$ bit positions that are one is at least
$1 - \exp(-n/36)$. The probability that all of these bits are touched in $T$ mutation steps
is at most $(1 - n^{-1/2})^{n/3} \le \exp(-(1/3)n^{1/2})$.

Thus, the probability that the optimum is found in $T$ steps is at most $\exp(-n/36) +
\exp(-(1/3)n^{1/2})$. □
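The arithmetic in this proof can be traced numerically. The Python sketch below evaluates, for a few illustrative pairs $(n, c)$, the time horizon $T$, the probability that one fixed bit is never flipped within $T$ steps, the probability that $n/3$ given bits are all touched, and the final failure bound. The name `c_hat` stands for the constant $\max\{1, c\}$ from Theorem 19; the function name and the sampled values are our own assumptions.

```python
import math

def lower_bound_terms(n, c):
    """Quantities from the lower-bound argument for mutation rate p_n = c/n."""
    c_hat = max(1.0, c)
    T = n * math.log(n) / (2.0 * (c_hat + 1.0))    # time horizon
    untouched = (1.0 - c / n) ** T                 # one fixed bit never flipped in T steps
    all_touched = (1.0 - n ** -0.5) ** (n / 3.0)   # bound for n/3 bits all being touched
    failure = math.exp(-n / 36.0) + math.exp(-math.sqrt(n) / 3.0)
    return T, untouched, all_touched, failure

for n in (100, 1000, 10000):
    for c in (0.5, 1.0, 3.0):
        T, untouched, all_touched, failure = lower_bound_terms(n, c)
        print(f"n={n}, c={c}: T={T:.1f}, untouched={untouched:.4f}, "
              f"n^(-1/2)={n ** -0.5:.4f}, all_touched={all_touched:.3e}, "
              f"failure_bound={failure:.3e}")
```

For these sample values the untouched probability stays above $n^{-1/2}$, as the proof requires once $n \ge n_0$.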



6 Conclusion 

Let $c$ be a positive constant. Let $F$ be a family of linear objective functions over bit
strings. Theorem 7 shows that the (1+1) EA for minimising $F$ with independent
bit-mutation rate $p_n = c/n$ has expected optimisation time $O(n(f) \log n(f))$. The proof of
the theorem constructs a feasible family of drift functions for the EA that is piece-wise
polynomial. The construction of the drift functions depends on the relevant objective
functions. By reproving a classical drift theorem, we show that our bound on the
expected optimisation time also holds with high probability. This version of the drift
theorem makes it easy to extend a number of other classical bounds stemming from
drift or "expected multiplicative weight decrease" arguments to hold with high
probability, instead of only in expectation (see [1]). We expect this version of the
drift theorem to become a useful tool in the theory of evolutionary algorithms.